This vignette shows how to use the basic functionality of the gplite package. We’ll start with a simple regression example, and then illustrate the usage with non-Gaussian likelihoods. This vignette assumes that you’re already familiar with Gaussian processes (GPs), so we shall not focus on the mathematical details. To read more about GPs, see Rasmussen and Williams (2006).

1 Regression

library(gplite)
library(ggplot2)
theme_set(theme_classic())

Let us simulate some toy regression data.

# create some toy 1d regression data
set.seed(32004)
n <- 200
sigma <- 0.1
x <- rnorm(n)
y <- sin(3*x)*exp(-abs(x)) +  rnorm(n)*sigma 
ggplot() + 
  geom_point(aes(x=x,y=y), size=0.5) + 
  xlab('x') + ylab('y') + xlim(-4,4)

Set up the model with a squared exponential covariance function and a Gaussian likelihood. With a Gaussian observation model, we can compute the posterior analytically for a given set of hyperparameters.

So let’s first create the model. We’ll use the squared exponential covariance function for this example.

gp <- gp_init(cfs = cf_sexp(), lik = lik_gaussian())

There are three hyperparameters in the model. We can see their initial values using get_param. Notice that this function returns the parameters on log scale, so we take the exponent to get the actual values:

exp(get_param(gp))
##     cf_sexp.lscale       cf_sexp.magn lik_gaussian.sigma 
##                0.3                1.0                0.5
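If the defaults are not suitable, the initial values can be changed before optimization. A minimal sketch, assuming that set_param accepts a vector of parameters on the same log scale (and in the same order) as the one returned by get_param:

```r
library(gplite)

gp <- gp_init(cfs = cf_sexp(), lik = lik_gaussian())

# set the initial length-scale, magnitude and noise std
# (the values here are illustrative, not recommendations);
# set_param expects them on log scale, matching get_param
gp <- set_param(gp, log(c(0.1, 1.0, 0.5)))
exp(get_param(gp))
```

Changing the starting point can matter because the marginal likelihood surface is generally multimodal, so gp_optim may converge to different local optima from different initializations.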

We can optimize the hyperparameters to the maximum marginal likelihood solution using gp_optim.

gp <- gp_optim(gp, x, y)

Let’s print out the hyperparameters after optimization:

exp(get_param(gp))
##     cf_sexp.lscale       cf_sexp.magn lik_gaussian.sigma 
##         0.42449158         0.33185388         0.09841499

We can easily make predictions with the fitted model using the function gp_pred. So let’s predict and visualize the results:

# compute the predictive mean and variance in a grid of points
xt <- seq(-4,4,len=300)
pred <- gp_pred(gp, xt, var=TRUE)

# visualize
mu <- pred$mean
lb <- pred$mean - 2*sqrt(pred$var)
ub <- pred$mean + 2*sqrt(pred$var)
ggplot() + 
  geom_ribbon(aes(x=xt, ymin=lb, ymax=ub), fill='lightgray') +
  geom_line(aes(x=xt, y=mu), size=1) +
  geom_point(aes(x=x, y=y), size=0.5) +
  xlab('x') + ylab('y')
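As a quick sanity check on the fit, we can compare the predictive mean at the training inputs to the observations: the residual standard deviation should be close to the estimated noise level (roughly 0.1 here). The snippet below regenerates the toy data so that it runs on its own.

```r
library(gplite)

# recreate the toy regression data from above
set.seed(32004)
n <- 200
sigma <- 0.1
x <- rnorm(n)
y <- sin(3*x)*exp(-abs(x)) + rnorm(n)*sigma

# fit the model as before
gp <- gp_init(cfs = cf_sexp(), lik = lik_gaussian())
gp <- gp_optim(gp, x, y)

# residual standard deviation at the training inputs;
# should be close to the estimated noise sigma
pred <- gp_pred(gp, x)
resid_sd <- sd(y - pred$mean)
print(resid_sd)
```

A residual standard deviation far above the estimated noise sigma would suggest underfitting (for example, a too long length-scale), whereas a value far below it would suggest overfitting to the noise.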

2 Binary classification

# create 1d toy binary classification data
set.seed(32004)
n <- 150
sigma <- 0.1
x <- rnorm(n)
ycont <- sin(3*x)*exp(-abs(x)) +  rnorm(n)*sigma
y <- rep(0,n)
y[ycont > 0] <- 1

ggplot() + 
  geom_point(aes(x=x,y=y), size=0.5) + 
  xlab('x') + ylab('y')

Set up the model and optimize the hyperparameters. Notice that due to the non-Gaussian likelihood, the posterior is not analytically available, so the fitting will use the Laplace approximation.

gp <- gp_init(cfs = cf_sexp(), lik = lik_bernoulli())
gp <- gp_optim(gp, x, y)

Let us visualize the predictive fit. For a non-Gaussian likelihood, this is most conveniently done by drawing from the predictive distribution using gp_draw, and then computing the intervals from the obtained draws.

# predict in a grid of points
xt <- seq(-4,4,len=200)
pred <- gp_draw(gp, xt, draws=1000)
pred_mean <- rowMeans(pred)
qt <- apply(pred, 1, quantile, c(0.05, 0.95))
pred_lb <- qt[1,]
pred_ub <- qt[2,]

ggplot() + 
  geom_ribbon(aes(x=xt,ymin=pred_lb,ymax=pred_ub), fill='lightgray') +
  geom_line(aes(x=xt,y=pred_mean), color='black', size=1) +
  geom_point(aes(x=x,y=y), color='black', size=0.5) +
  xlab('x') + ylab('y')

The fit looks reasonable, but one could expect the predictive probabilities to be even closer to zero and one around the origin, given the data we see. Indeed, it is well known that the Laplace approximation tends to be too conservative and underestimate the extreme probabilities in cases like this (see, for example, Kuss and Rasmussen 2005).
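Another option worth mentioning is to change the posterior approximation itself. A sketch, assuming your version of gplite supports an approx argument in gp_init with approx_ep() for expectation propagation, which is typically more accurate than Laplace for binary classification (Kuss and Rasmussen 2005):

```r
library(gplite)

# use the EP approximation instead of the default Laplace
# (assumes the approx argument and approx_ep() are available)
gp_ep <- gp_init(cfs = cf_sexp(), lik = lik_bernoulli(), approx = approx_ep())
```

After this, gp_optim, gp_pred and gp_draw are used exactly as before.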

For more accurate inference, we can run MCMC for the latent values with the optimized hyperparameters.

gpmc <- gp_mcmc(gp, x, y, iter=1000, chains=2)

Let’s visualize the fit again

# predict in a grid of points
xt <- seq(-4,4,len=200)
pred <- gp_draw(gpmc, xt)
pred_mean <- rowMeans(pred)
qt <- apply(pred, 1, quantile, c(0.05, 0.95))
pred_lb <- qt[1,]
pred_ub <- qt[2,]

ggplot() + 
  geom_ribbon(aes(x=xt,ymin=pred_lb,ymax=pred_ub), fill='lightgray') +
  geom_line(aes(x=xt,y=pred_mean), color='black', size=1) +
  geom_point(aes(x=x,y=y), color='black', size=0.5) +
  xlab('x') + ylab('y')

As we can see, the fit is substantially different in the regions where the predictive probability is close to zero or one. It is worth noting that even though the Laplace approximation is not very accurate in the extreme cases, it gives reasonable estimates for the hyperparameters. Running MCMC for the latent values given those hyperparameters then yields more accurate estimates for the latent function values.

References

Kuss, Malte, and Carl Edward Rasmussen. 2005. “Assessing Approximate Inference for Binary Gaussian Process Classification.” Journal of Machine Learning Research 6: 1679–1704.

Rasmussen, Carl Edward, and Christopher K. I. Williams. 2006. Gaussian Processes for Machine Learning. The MIT Press. http://www.gaussianprocess.org/gpml/.